Goto

Collaborating Authors

temporal modeling


Recurrent Ladder Networks

Neural Information Processing Systems

We propose a recurrent extension of the Ladder networks whose structure is motivated by the inference required in hierarchical latent variable models. We demonstrate that the recurrent Ladder is able to handle a wide variety of complex learning tasks that benefit from iterative inference and temporal modeling. The architecture shows close-to-optimal results on temporal modeling of video data, competitive results on music modeling, and improved perceptual grouping based on higher order abstractions, such as stochastic textures and motion cues.
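The abstract above describes the architecture only at a high level; as a rough illustration of the kind of per-layer update it implies, the sketch below mixes a bottom-up input, a top-down signal, and a recurrent state in a single layer. The module names, dimensions, and GRU-style update are illustrative assumptions, not the authors' implementation.

# Minimal sketch of one recurrent Ladder-style update step (illustrative only).
# Assumes flat feature vectors per layer; the original work is richer than this.
import torch
import torch.nn as nn

class RecurrentLadderCell(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.bottom_up = nn.Linear(dim, dim)    # encoder path
        self.top_down = nn.Linear(dim, dim)     # decoder path
        self.update = nn.GRUCell(2 * dim, dim)  # recurrent combination of the two

    def forward(self, x, top_down_signal, state):
        # Combine the bottom-up input with the top-down signal, then update the
        # layer's recurrent state (one step of iterative inference over time).
        bu = torch.relu(self.bottom_up(x))
        td = torch.relu(self.top_down(top_down_signal))
        return self.update(torch.cat([bu, td], dim=-1), state)

# Usage: one inference iteration for a single layer.
cell = RecurrentLadderCell(dim=32)
x = torch.randn(8, 32)    # bottom-up input (batch of 8)
td = torch.randn(8, 32)   # signal from the layer above
h = torch.zeros(8, 32)    # recurrent state
h = cell(x, td, h)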



c6e954799a0218f6d341ad5cbfb58999-Paper-Conference.pdf

Neural Information Processing Systems

In video recognition, we need to sample multiple frames to represent each video, which makes the computational cost scale proportionally to the number of sampled frames. In most cases, a small proportion of all the frames is sampled for each input, which only contains limited information of the original video.
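To make the sampling trade-off above concrete, here is a minimal sketch of uniform frame-index sampling: downstream compute scales with num_samples, while coverage of the original video stays partial. The function name and segment-center heuristic are generic assumptions, not tied to any particular recognition model.

# Uniformly sample a small, fixed number of frame indices from a video.
# Downstream compute scales with num_samples; information coverage is partial.
import numpy as np

def uniform_frame_indices(num_frames, num_samples):
    # Centers of num_samples equal-length segments over [0, num_frames).
    segment = num_frames / num_samples
    return np.minimum((np.arange(num_samples) * segment + segment / 2).astype(int),
                      num_frames - 1)

print(uniform_frame_indices(num_frames=300, num_samples=8))
# [ 18  56  93 131 168 206 243 281]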




Exploiting Spatiotemporal Properties for Efficient Event-Driven Human Pose Estimation

Zhou, Haoxian, Xu, Chuanzhi, Chen, Langyi, Chen, Haodong, Chung, Yuk Ying, Qu, Qiang, Chen, Xiaoming, Cai, Weidong

arXiv.org Artificial Intelligence

Human pose estimation focuses on predicting body keypoints to analyze human motion. Event cameras provide high temporal resolution and low latency, enabling robust estimation under challenging conditions. However, most existing methods convert event streams into dense event frames, which adds extra computation and sacrifices the high temporal resolution of the event signal. In this work, we exploit the spatiotemporal properties of event streams within a point cloud-based framework designed to enhance human pose estimation performance. We design an Event Temporal Slicing Convolution module to capture short-term dependencies across event slices and combine it with an Event Slice Sequencing module for structured temporal modeling. We also apply edge enhancement to the point cloud-based event representation, strengthening spatial edge information under sparse event conditions to further improve performance. Experiments on the DHP19 dataset show that our proposed method consistently improves performance across three representative point cloud backbones: PointNet, DGCNN, and Point Transformer.
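As a rough illustration of the slicing step described in the abstract, the sketch below splits a raw event stream (x, y, t, polarity) into fixed-duration temporal slices, each a small point cloud that a backbone such as PointNet could consume. The array layout, slice count, and synthetic data are assumptions for illustration, not the paper's exact pipeline.

# Slice an event stream into fixed-duration temporal windows ("event slices").
# Each slice is a small point cloud of (x, y, t, polarity) events.
import numpy as np

def slice_events(events, num_slices):
    # events: (N, 4) array with columns x, y, t, polarity; t assumed sorted ascending.
    t = events[:, 2]
    edges = np.linspace(t[0], t[-1], num_slices + 1)
    slice_ids = np.clip(np.searchsorted(edges, t, side="right") - 1, 0, num_slices - 1)
    return [events[slice_ids == i] for i in range(num_slices)]

# Tiny synthetic stream: 1000 events over 100 ms on a 346x260 sensor (illustrative size).
rng = np.random.default_rng(0)
events = np.column_stack([
    rng.integers(0, 346, 1000),           # x
    rng.integers(0, 260, 1000),           # y
    np.sort(rng.uniform(0, 0.1, 1000)),   # t (seconds)
    rng.integers(0, 2, 1000),             # polarity
])
slices = slice_events(events, num_slices=8)
print([len(s) for s in slices])  # roughly equal counts per temporal slice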





TiS-TSL: Image-Label Supervised Surgical Video Stereo Matching via Time-Switchable Teacher-Student Learning

Wang, Rui, Zhou, Ying, Wang, Hao, Zhang, Wenwei, Li, Qiang, Wang, Zhiwei

arXiv.org Artificial Intelligence

Stereo matching in minimally invasive surgery (MIS) is essential for next-generation navigation and augmented reality. Yet, dense disparity supervision is nearly impossible due to anatomical constraints, typically limiting annotations to only a few image-level labels acquired before the endoscope enters deep body cavities. Teacher-Student Learning (TSL) offers a promising solution by leveraging a teacher trained on sparse labels to generate pseudo labels and associated confidence maps from abundant unlabeled surgical videos. However, existing TSL methods are confined to image-level supervision, providing only spatial confidence and lacking temporal consistency estimation. This absence of spatio-temporal reliability results in unstable disparity predictions and severe flickering artifacts across video frames. To overcome these challenges, we propose TiS-TSL, a novel time-switchable teacher-student learning framework for video stereo matching under minimal supervision. At its core is a unified model that operates in three distinct modes: Image-Prediction (IP), Forward Video-Prediction (FVP), and Backward Video-Prediction (BVP), enabling flexible temporal modeling within a single architecture. Enabled by this unified model, TiS-TSL adopts a two-stage learning strategy. The Image-to-Video (I2V) stage transfers sparse image-level knowledge to initialize temporal modeling. The subsequent Video-to-Video (V2V) stage refines temporal disparity predictions by comparing forward and backward predictions to calculate bidirectional spatio-temporal consistency. This consistency identifies unreliable regions across frames, filters noisy video-level pseudo labels, and enforces temporal coherence. Experimental results on two public datasets demonstrate that TiS-TSL outperforms other image-based state-of-the-art methods, improving TEPE and EPE by at least 2.11% and 4.54%, respectively.
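One plausible way to realize the bidirectional consistency check sketched in the abstract is to compare forward and backward disparity predictions and mask out pixels where they disagree. The thresholds and the simple absolute-difference test below are assumptions for illustration, not the authors' formulation.

# Filter video-level pseudo labels with a forward/backward consistency check (illustrative).
# Regions where the forward and backward disparity predictions disagree are masked out.
import numpy as np

def consistency_mask(disp_forward, disp_backward, rel_thresh=0.05, abs_thresh=1.0):
    # Keep a pixel only if the two predictions agree within a relative/absolute tolerance.
    diff = np.abs(disp_forward - disp_backward)
    tol = np.maximum(rel_thresh * np.abs(disp_forward), abs_thresh)
    return diff <= tol

def filtered_pseudo_labels(disp_forward, disp_backward):
    mask = consistency_mask(disp_forward, disp_backward)
    pseudo = 0.5 * (disp_forward + disp_backward)   # fuse the two predictions
    return np.where(mask, pseudo, np.nan)           # NaN marks unreliable regions

# Toy example on a 4x4 disparity map.
f = np.full((4, 4), 20.0)
b = f.copy()
b[0, 0] = 30.0                                      # a flickering, inconsistent pixel
labels = filtered_pseudo_labels(f, b)
print(np.isnan(labels).sum())                       # 1 pixel rejected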


Video Flow as Time Series: Discovering Temporal Consistency and Variability for VideoQA

Song, Zijie, Hu, Zhenzhen, Ma, Yixiao, Li, Jia, Hong, Richang

arXiv.org Artificial Intelligence

Video Question Answering (VideoQA) is a complex video-language task that demands a sophisticated understanding of both visual content and temporal dynamics. Traditional Transformer-style architectures, while effective at integrating multimodal data, often simplify temporal dynamics through positional encoding and fail to capture non-linear interactions within video sequences. In this paper, we introduce the Temporal Trio Transformer (T3T), a novel architecture that models temporal consistency and temporal variability. The TS module employs a Brownian Bridge to capture smooth, continuous temporal transitions, while the TD module identifies and encodes significant temporal variations and abrupt changes within the video content. The efficacy of the T3T is demonstrated through extensive testing on multiple VideoQA benchmark datasets. Our results underscore the importance of a nuanced approach to temporal modeling in improving the accuracy and depth of video-based question answering. In the realm of video-language tasks, VideoQA stands out as one of the challenges that demand a high degree of temporal understanding, since video and language are both sequential forms of information characterized by their temporality. This task requires models not only to process visual content but also to reason across the temporal sequence of events in a video in response to specific questions [1]-[4].
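As a hedged illustration of the Brownian Bridge idea mentioned above, the snippet below samples each frame feature from the marginal distribution of a bridge pinned at the first and last frame embeddings, so variance is zero at the endpoints and peaks mid-sequence. The dimensions, sigma, and per-frame independent sampling are illustrative assumptions, not the T3T formulation.

# Brownian bridge over frame features: pinned at the first and last frame embeddings,
# with marginal variance sigma^2 * t * (1 - t) that peaks in the middle of the sequence.
import numpy as np

def brownian_bridge_features(start, end, num_frames, sigma=0.1, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    ts = np.linspace(0.0, 1.0, num_frames)[:, None]   # (T, 1) normalized timestamps
    mean = (1.0 - ts) * start + ts * end              # linear interpolation of endpoints
    std = sigma * np.sqrt(ts * (1.0 - ts))            # zero at both endpoints
    return mean + std * rng.standard_normal((num_frames, start.shape[0]))

start = np.zeros(16)                                  # first-frame embedding
end = np.ones(16)                                     # last-frame embedding
feats = brownian_bridge_features(start, end, num_frames=10, rng=np.random.default_rng(0))
print(feats.shape)   # (10, 16); the first and last rows equal start and end exactly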